Exploring pronunciation variants for Romanian speech-to-text transcription

Authors

  • Ioana Vasilescu
  • Bianca Vieru-Dimulescu
  • Lori Lamel
Abstract

Speech processing tools were applied to investigate morpho-phonetic trends in contemporary spoken Romanian, with the objective of improving the pronunciation dictionary and, more generally, the acoustic models of a speech recognition system. As no manually transcribed audio data were available for training, language models were estimated on a large text corpus and used to provide indirect supervision for training the acoustic models in a semi-supervised manner. Automatic transcription errors were analyzed to gain insight into language-specific features, both to improve the current performance of the system and to explore linguistic issues. Two aspects of Romanian morpho-phonology were investigated on the basis of this analysis: the deletion of the masculine definite article -l, and the secondary palatalization of plural nouns and adjectives and of second-person indicative verb forms.
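
As an illustration only, the two phenomena named in the abstract could be represented as optional variants in a pronunciation dictionary. The sketch below is not the authors' method: the function name, the phone symbols, the "_j" palatalization marker, and the purely orthographic triggers (word-final "ul", word-final consonant + "i") are all assumptions made for this example, and such spelling-based rules will overgenerate for words where the ending is not the article or a plural/verbal marker.

```python
# Hypothetical sketch: add optional pronunciation variants to a lexicon entry.
# Phone symbols and the "_j" palatalization marker are illustrative assumptions.

VOWELS = set("aeiouăâî")  # Romanian vowel letters


def pronunciation_variants(word, base_phones):
    """Return the base pronunciation followed by any optional variants."""
    word = word.lower()
    phones = list(base_phones)
    variants = [phones]

    # Variant 1: optional deletion of the final -l of the masculine
    # definite article (orthographic "-ul"), e.g. "omul" -> o m u.
    if word.endswith("ul") and phones[-1:] == ["l"]:
        variants.append(phones[:-1])

    # Variant 2: word-final "-i" after a consonant realized as a
    # palatalized consonant instead of a full vowel, e.g. "lupi" -> l u p_j.
    if (
        len(word) >= 2
        and word.endswith("i")
        and word[-2] not in VOWELS
        and len(phones) >= 2
        and phones[-1] == "i"
    ):
        variants.append(phones[:-2] + [phones[-2] + "_j"])

    return variants


if __name__ == "__main__":
    lexicon = {
        "omul": ["o", "m", "u", "l"],
        "lupi": ["l", "u", "p", "i"],
        "casa": ["k", "a", "s", "a"],
    }
    for entry, phones in lexicon.items():
        for variant in pronunciation_variants(entry, phones):
            print(entry, " ".join(variant))
```

In a recognizer, variants like these would typically be listed as alternative entries for the same word, letting alignment against audio indicate which form speakers actually produce.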

Similar resources

Generating proper name pro for automatic speech

Generating correct pronunciations of proper names remains one of the most difficult tasks in text-to-phoneme transcription. Although phonetic rules can be efficient in processing proper names of one language, foreign family names cannot always be generated correctly without additional pronunciation rules. The present study addresses the problem of pronunciation variants for French and foreign fa...

Experimental detection of vowel pronunciation variants in Amharic

The pronunciation lexicon is a fundamental element in an automatic speech transcription system. It associates each lexical entry (usually a grapheme) with one or more phonemic or phone-like forms, the pronunciation variants. Thorough knowledge of the target language is a priori necessary to establish the pronunciation baseforms and variants. The reliance on human expertise can pose difficultie...

Detailed pronunciation variant modeling for speech transcription

Modeling pronunciation variants is an important topic for automatic speech recognition. This paper investigates pronunciation modeling at the lexical level and presents a detailed modeling of the probabilities of the pronunciation variants. The approach is evaluated on the French ESTER2 corpus, and a significant word error rate reduction is achieved through the use of context and speaking ...

Automatic Generation of Phon for Large Speech C

We describe a method for the automatic production of phonetic transcriptions in large speech corpora. First, we focus on the application of different techniques for the generation of pronunciation variants. Then, we explain the application of a speech recognition system for selecting the acoustically best matching phonetic transcription. The system is evaluated on different test sets selected f...

Pronunciation Variants Across Systems, Languages and Speaking Style

This contribution aims at evaluating the use of pronunciation variants across different system configurations, languages and speaking styles. This study is limited to the use of variants during speech alignment, given an orthographic transcription and a phonemically represented lexicon, thus focusing on the modeling abilities of the acoustic word models. Parallel and sequential variants are tes...

Publication date: 2014